Abstract: This paper addresses the optimal scaling of the 
ADMM method for distributed quadratic programming. Scaled 
ADMM iterations are first derived for generic equality- 
constrained quadratic problems and then applied to a class 
of distributed quadratic problems. In this setting, the scaling 
corresponds to the step-size and the edge-weights of the 
underlying communication graph. We optimize the convergence 
factor of the algorithm with respect to the step-size and graph 
edge-weights. Explicit analytical expressions for the optimal 
convergence factor and the optimal step-size are derived. 
Numerical simulations illustrate our results. 



I. Introduction 

Recently, a number of applications have triggered a strong 
interest in distributed algorithms for large-scale quadratic 
programming. These applications include multi-agent sys- 
tems [1], [2], distributed model predictive control [3], [4], 
and state estimation in networks [5], to name a few. As these 
systems become larger and their complexity increases, more 
efficient algorithms are required. It has been argued that the 
alternating direction method of multipliers (ADMM) is a par- 
ticularly powerful and efficient approach [6]. One attractive 
feature of ADMM is that it is guaranteed to converge for all 
(positive) values of its step-size parameter [6]. This contrasts 
with many alternative techniques, such as dual decomposition, 
where mistuning of the step-size for the gradient updates 
can render the iterations unstable. 

The ADMM method has been observed to converge fast in 
many applications [6]-[9] and for certain classes of problems 
it has a guaranteed linear rate of convergence [10]-[12]. 
However, the solution times are sensitive to the choice 
of the step-size parameter, and the ADMM iterations can 
converge (much) slower than the standard gradient algorithm 
if the parameter is poorly tuned. In practice, the ADMM 
algorithm is tuned empirically for each specific application. 
In particular, for distributed quadratic programming, [7]- 
[9] report various rules of thumb for picking the step-size. 
However, a thorough analysis and design of optimal step-size 
and scaling rules for the ADMM algorithm is still missing 
in the literature. The aim of this paper is to close this gap 
for a class of distributed quadratic programming problems. 

We first consider a particular class of equality-constrained 
quadratic programming problems and derive the correspond- 
ing iterations for the ADMM method. The iterations are 
shown to be linear and the corresponding eigenvalues are 
characterized as roots of quadratic polynomials. These re- 
sults are then used to develop optimally scaled ADMM 
iterations for a class of distributed quadratic programming 
problems that appear in power network state-estimation ap- 
plications [13]. In this class of problems, a number of agents 
collaborate with neighbors in a graph to minimize a convex 
objective function with a specific sparsity structure over a 
mix of shared and private variables. We show that quadratic 
programming problems with this structure can be reduced 
to an equality constrained convex quadratic programming 
problem in terms of private variables only. The ADMM iter- 
ations for this quadratic problem are then formulated taking 
into account the communication network constraints. The 
network-constrained scaling of the ADMM method includes 
the step-size and edge weights of the communication graph. 
Methods to minimize the convergence factor by optimal 
scaling of the ADMM iterations are proposed for generic 
connected graphs. In particular, analytical expressions for 
the optimal step-size and convergence factor are derived 
in terms of the spectral properties of the communication 
graph. A tight lower-bound for the convergence factor is also 
obtained. Finally, given that the optimal step-size is chosen, 
we propose methods to further minimize the convergence 
factor by optimizing the edge weights. 

The outline of this paper is as follows. Section II gives an 
elementary background to the ADMM method. The ADMM 
iterations for a class of equality-constrained quadratic pro- 
gramming problems are formulated and analyzed in Sec- 
tion III. Distributed quadratic programming and optimal 
network-constrained scaling of the ADMM algorithm are 
addressed in Section IV. Numerical examples illustrating our 
results and comparing them to state-of-the-art techniques are 
presented in Section V. Finally, a discussion and outlook on 
future research concludes the paper. 

A. Notation 

We denote the sets of real and complex numbers by $\mathbb{R}$ and $\mathbb{C}$, respectively. For a given matrix $A \in \mathbb{R}^{n \times m}$, denote $\mathcal{R}(A) \triangleq \{y \in \mathbb{R}^n \mid y = Ax,\ x \in \mathbb{R}^m\}$ as its range-space and let $\mathcal{N}(A) \triangleq \{x \in \mathbb{R}^m \mid Ax = 0\}$ be the null-space of $A$. For $A$ with full-column rank, define $A^\dagger \triangleq (A^\top A)^{-1}A^\top$ as the pseudo-inverse of $A$ and $\Pi_{\mathcal{R}(A)} \triangleq AA^\dagger$ as the orthogonal projector onto $\mathcal{R}(A)$. Since $\mathcal{R}(A)$ and $\mathcal{N}(A^\top)$ are orthogonal complements, we have $\Pi_{\mathcal{N}(A^\top)} = I - \Pi_{\mathcal{R}(A)}$ and $\Pi_{\mathcal{R}(A)}\Pi_{\mathcal{N}(A^\top)} = 0$. Now consider $A, D \in \mathbb{R}^{n \times n}$, with $D$ invertible. The generalized eigenvalues of $(A, D)$ are defined as the values $\lambda \in \mathbb{C}$ such that $(A - \lambda D)v = 0$ holds for some nonzero vector $v \in \mathbb{C}^n$. Additionally, $A \succ 0$ ($A \succeq 0$) denotes that $A$ is positive definite (semi-definite).

Consider the sequence $\{x^k\}$ converging to the fixed-point $x^\star$. The convergence factor of the converging sequence is defined as [14]

$$\phi^\star \triangleq \sup_{x^k \neq x^\star} \frac{\|x^{k+1} - x^\star\|}{\|x^k - x^\star\|}. \tag{1}$$

Definitions from graph theory are now presented [15]. Let $\mathcal{G}(\mathcal{V}, \mathcal{E}, W)$ be a connected undirected graph with vertex set $\mathcal{V}$ with $n$ vertices, edge set $\mathcal{E}$ with $m$ edges, and edge-weights $W$. Each vertex $i \in \mathcal{V}$ represents an agent, and an edge $e_k = (i, j) \in \mathcal{E}$ means that agents $i$ and $j$ can exchange information. Letting $w_{e_k} \geq 0$ be the weight of $e_k$, the edge-weight matrix is defined as $W \triangleq \mathrm{diag}([w_{e_1} \ \dots \ w_{e_m}])$. Denote $\mathcal{N}_i \triangleq \{j \neq i \mid (i, j) \in \mathcal{E}\}$ as the neighbor set of node $i$. Define $\mathcal{A}$ as the span of real symmetric matrices, $\mathcal{S}^n$, with sparsity pattern induced by $\mathcal{G}$, $\mathcal{A} \triangleq \{S \in \mathcal{S}^n \mid S_{ij} = 0 \text{ if } i \neq j \text{ and } (i, j) \notin \mathcal{E}\}$. The adjacency matrix $A \in \mathcal{A}$ is defined as $A_{ij} = w_{ij}$ for $(i, j) \in \mathcal{E}$ and $A_{ii} = 0$. The diagonal degree matrix $D$ is given by $D_{ii} = \sum_{j \in \mathcal{N}_i} A_{ij}$. The incidence matrix $B \in \mathbb{R}^{m \times n}$ is defined as $B_{ij} = 1$ if $j \in e_i$ and $B_{ij} = 0$ otherwise.

II. The ADMM method

The ADMM algorithm solves problems of the form

$$\begin{array}{ll} \underset{x,\,z}{\text{minimize}} & f(x) + g(z) \\ \text{subject to} & Ex + Fz - h = 0 \end{array} \tag{2}$$

where $f$ and $g$ are convex functions, $x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, $h \in \mathbb{R}^p$. Moreover, $E \in \mathbb{R}^{p \times n}$ and $F \in \mathbb{R}^{p \times m}$ have full-column rank; see [6] for a detailed review. The method is based on the augmented Lagrangian

$$L_\rho(x, z, \mu) = f(x) + g(z) + (\rho/2)\|Ex + Fz - h\|_2^2 + \mu^\top(Ex + Fz - h) \tag{3}$$

and performs sequential minimization of the $x$ and $z$ variables, followed by a dual variable update:

$$\begin{aligned} x^{k+1} &= \underset{x}{\operatorname{argmin}}\ L_\rho(x, z^k, \mu^k) \\ z^{k+1} &= \underset{z}{\operatorname{argmin}}\ L_\rho(x^{k+1}, z, \mu^k) \\ \mu^{k+1} &= \mu^k + \rho(Ex^{k+1} + Fz^{k+1} - h). \end{aligned} \tag{4}$$

These iterations indicate that the method is particularly useful when the $x$- and $z$-minimizations can be carried out efficiently (e.g., admit closed-form expressions). One advantage of the method is that there is only one single algorithm parameter, $\rho$, and under rather mild conditions, the method can be shown to converge for all values of the parameter; see, e.g., [6]. However, $\rho$ has a direct impact on the convergence speed of the algorithm, and inadequate tuning of this parameter may render the method very slow. In the remaining parts of this paper, we will derive explicit expressions for the step-size $\rho$ that minimizes the convergence factor (1) for some particular classes of problems.

III. ADMM FOR A CLASS OF EQUALITY-CONSTRAINED QUADRATIC PROGRAMMING PROBLEMS

In this section, we develop scaled ADMM iterations for a particular class of equality-constrained convex quadratic programming problems. In terms of the standard formulation (2), these problems have $f(x) = \frac{1}{2}x^\top Qx + q^\top x$ with $Q \succ 0$ and $q \in \mathbb{R}^n$, $g(z) = 0$, and $h = 0$.

An important difference compared to the standard ADMM iterations described in the previous section is the introduction of a matrix $R \in \mathbb{R}^{r \times p}$ scaling the equality constraints

$$R(Ex + Fz) = 0. \tag{5}$$

The underlying assumption on the choice of $R$ is that all non-zero vectors $v = Ex + Fz$, $\forall x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, do not belong to the null-space of $R$. In other words, after the transformation (5) the feasible set in (2) remains unchanged. Taking into account the transformation in (5), the penalty term in the augmented Lagrangian becomes

$$\frac{1}{2}(Ex + Fz)^\top \rho R^\top R (Ex + Fz). \tag{6}$$

Definition 1: $\rho R^\top R$ is called the scaling of the augmented Lagrangian (3). 

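Our aim is to find the optimal scaling that minimizes the convergence factor of the corresponding ADMM iterations. Specifically, introducing $\bar E = RE$ and $\bar F = RF$, the scaled ADMM iterations read

$$\begin{aligned} x^{k+1} &= (Q + \rho \bar E^\top \bar E)^{-1}\left(-q - \rho \bar E^\top(\bar F z^k + u^k)\right) \\ z^{k+1} &= -(\bar F^\top \bar F)^{-1}\bar F^\top(\bar E x^{k+1} + u^k) \\ u^{k+1} &= u^k + \bar E x^{k+1} + \bar F z^{k+1}, \end{aligned} \tag{7}$$

where $u^k = \mu^k/\rho$. From the $z$- and $u$-iterations we observe

$$u^{k+1} = (u^k + \bar E x^{k+1}) - \bar F(\bar F^\top \bar F)^{-1}\bar F^\top(\bar E x^{k+1} + u^k) = \Pi_{\mathcal{N}(\bar F^\top)}(\bar E x^{k+1} + u^k).$$

Since $\mathcal{N}(\bar F^\top)$ and $\mathcal{R}(\bar F)$ are orthogonal complements, we have $\Pi_{\mathcal{R}(\bar F)}u^k = 0$ for all $k$, which results in

$$\bar F z^k = -\Pi_{\mathcal{R}(\bar F)}\bar E x^k. \tag{8}$$

By induction the $u$-iterations can be rewritten as

$$u^{k+1} = \Pi_{\mathcal{N}(\bar F^\top)}\left(\sum_{i=1}^{k+1} \bar E x^i + u^0\right). \tag{9}$$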
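The iterations (7) translate almost line by line into code. The following is a minimal NumPy sketch, not the authors' implementation: the function name `scaled_admm` and the two-variable toy data are assumptions made here for illustration only. It also checks numerically that the scaled dual variable stays in the null-space of $\bar F^\top$, as observed above.

```python
import numpy as np

def scaled_admm(Q, q, E_bar, F_bar, rho, iters=200):
    """Scaled ADMM iterations (7) for
    minimize 0.5*x'Qx + q'x  subject to  E_bar x + F_bar z = 0."""
    x = np.zeros(E_bar.shape[1])
    z = np.zeros(F_bar.shape[1])
    u = np.zeros(E_bar.shape[0])          # scaled dual variable u = mu / rho
    H = Q + rho * E_bar.T @ E_bar         # matrix inverted in the x-update
    G = F_bar.T @ F_bar                   # matrix inverted in the z-update
    for _ in range(iters):
        x = np.linalg.solve(H, -q - rho * E_bar.T @ (F_bar @ z + u))
        z = -np.linalg.solve(G, F_bar.T @ (E_bar @ x + u))
        u = u + E_bar @ x + F_bar @ z
    return x, z, u

# Hypothetical toy data: two scalar copies x_1, x_2 tied to one variable z.
Q = np.diag([1.0, 3.0])
q = np.array([-1.0, -2.0])
E_bar = np.eye(2)
F_bar = -np.ones((2, 1))
x, z, u = scaled_admm(Q, q, E_bar, F_bar, rho=1.0)
print(x, z)            # both copies approach the consensus minimizer
print(F_bar.T @ u)     # numerically zero: u lies in the null-space of F_bar^T
```

For this toy problem the constrained minimizer is $x_1 = x_2 = z = 3/4$, which can be checked by hand by minimizing $2y^2 - 3y$.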



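Supposing $u^0 = 0$, without loss of generality, and given (8) and (9), the $x$-iterations can be rewritten as

$$x^{k+1} = (Q + \rho \bar E^\top \bar E)^{-1}\left(-q + \rho \bar E^\top \Pi_{\mathcal{R}(\bar F)}\bar E x^k\right) - (Q + \rho \bar E^\top \bar E)^{-1}\rho \bar E^\top \Pi_{\mathcal{N}(\bar F^\top)}\sum_{i=1}^{k} \bar E x^i,$$

or in matrix form as

$$\begin{bmatrix} x^{k+1} \\ x^k \end{bmatrix} = \underbrace{\begin{bmatrix} M_{11} & M_{12} \\ I & 0 \end{bmatrix}}_{M} \begin{bmatrix} x^k \\ x^{k-1} \end{bmatrix} \tag{10}$$

with

$$\begin{aligned} M_{11} &= \rho(Q + \rho \bar E^\top \bar E)^{-1}\bar E^\top\left(\Pi_{\mathcal{R}(\bar F)} - \Pi_{\mathcal{N}(\bar F^\top)}\right)\bar E + I, \\ M_{12} &= -\rho(Q + \rho \bar E^\top \bar E)^{-1}\bar E^\top \Pi_{\mathcal{R}(\bar F)}\bar E. \end{aligned} \tag{11}$$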

The convergence properties of the ADMM iterations are characterized by the spectral properties of the matrix $M$. In particular, denote $\{\phi_i\}$ as the ordered eigenvalues of $M$ so that $|\phi_1| \leq \dots \leq |\phi_{2n-s}| < |\phi_{2n-s+1}| = \dots = |\phi_{2n}|$ for $s \geq 1$. The ADMM iterations converge to the optimal solution if $\phi_{2n} = \dots = \phi_{2n-s+1} = 1$ and the respective convergence factor (1) corresponds to $\phi^\star = |\phi_{2n-s}|$.

Below we state the main problem to be addressed in the 
remainder of this paper. 

Problem 1: What are the optimal scalar $\rho^\star$ and matrix $R^\star$ 
in the scaling $\rho R^\top R$ that minimize the convergence factor 
of the ADMM algorithm? 

As the initial step to tackle Problem 1, in what follows we characterize the eigenvalues $\phi_i$ of $M$. Let $[u^\top \ v^\top]^\top$ be an eigenvector of $M$ associated with the eigenvalue $\phi$, from which we conclude $\phi v = u$. Thus the following holds for the eigenvalues and corresponding eigenvectors of $M$:

$$\phi^2 v = \phi M_{11}v + M_{12}v. \tag{12}$$



Our analysis will be simplified by picking $R$ such that $\bar E^\top \bar E = \kappa Q$ for some $\kappa > 0$. The following lemma indicates that such an $R$ can always be found.

Lemma 1: For $E \in \mathbb{R}^{p \times n}$ with full-column rank and $\kappa > 0$, there exists an $R$ that does not change the constraint set in (2) and ensures that $\bar E^\top \bar E = \kappa Q$.

Proof: The proof is derived in the appendix. ■

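Now, replacing $\bar E^\top \bar E = \kappa Q$ in (11) we have

$$\begin{aligned} M_{11} &= \frac{\rho\kappa}{1 + \rho\kappa}(\bar E^\top \bar E)^{-1}\bar E^\top\left(\Pi_{\mathcal{R}(\bar F)} - \Pi_{\mathcal{N}(\bar F^\top)}\right)\bar E + I, \\ M_{12} &= -\frac{\rho\kappa}{1 + \rho\kappa}(\bar E^\top \bar E)^{-1}\bar E^\top \Pi_{\mathcal{R}(\bar F)}\bar E. \end{aligned}$$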



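The next result presents the explicit form of the eigenvalues of $M$ in (10).

Theorem 1: Consider the ADMM iterations (10). If $\bar E^\top \bar E = \kappa Q$, the eigenvalues of $M$ are described by

$$2\phi = \left(f(\rho)\bar\lambda + 1\right) \pm \sqrt{\left(f(\rho)\bar\lambda + 1\right)^2 - 2f(\rho)(\bar\lambda + 1)}, \tag{13}$$

with

$$f(\rho) = \frac{\rho\kappa}{1 + \rho\kappa}, \qquad \bar\lambda = \frac{v^\top\left(\bar E^\top\left(\Pi_{\mathcal{R}(\bar F)} - \Pi_{\mathcal{N}(\bar F^\top)}\right)\bar E\right)v}{v^\top(\bar E^\top \bar E)v}, \qquad \kappa = v^\top(\bar E^\top \bar E)v.$$

Proof: The result follows from (12) and $\bar E^\top \bar E = \kappa Q$. ■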



From (13) one directly sees how $\rho$ and $R$ affect the eigenvalues of $M$. Specifically, $f(\rho)$ is a function of $\rho$, while $\bar\lambda$ only depends on $R$. In the next section we address and solve Problem 1 for a particular class of problems. The analysis follows by applying Theorem 1 and studying the properties of (13) with respect to $\rho$ and $\bar\lambda$.
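To make the dependence on $\rho$ and $\bar\lambda$ concrete, the two roots in (13) can be evaluated directly. The sketch below uses only the Python standard library; the helper names `f` and `eigenvalue_pair` are chosen here for illustration and are not part of the paper.

```python
import cmath

def f(rho, kappa):
    """f(rho) = rho*kappa / (1 + rho*kappa), as in Theorem 1."""
    return rho * kappa / (1.0 + rho * kappa)

def eigenvalue_pair(rho, kappa, lam):
    """Both roots phi of (13) for a given value lam of lambda-bar.
    cmath.sqrt is used so that complex-conjugate pairs are handled too."""
    fr = f(rho, kappa)
    b = fr * lam + 1.0
    disc = cmath.sqrt(b * b - 2.0 * fr * (lam + 1.0))
    return (b + disc) / 2.0, (b - disc) / 2.0

# For lambda-bar = 1 the roots reduce to 1 and f(rho).
r1, r2 = eigenvalue_pair(rho=2.0, kappa=0.5, lam=1.0)
print(r1, r2)
```

With $\bar\lambda = 1$ the discriminant equals $(1 - f(\rho))^2$, so the roots are exactly $1$ and $f(\rho)$; this is the pair that reappears in the appendix when the eigenvector is taken along the all-ones direction.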

IV. ADMM FOR DISTRIBUTED QUADRATIC 
PROGRAMMING 

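We are now ready to develop optimal scalings for the ADMM iterations for distributed quadratic programming. Specifically, we will consider a scenario where $n$ agents collaborate to minimize an objective function of the form

$$\underset{\eta}{\text{minimize}} \quad \frac{1}{2}\eta^\top \bar Q \eta + \bar q^\top \eta, \tag{14}$$

where $\eta = [\eta_1^\top \ \dots \ \eta_n^\top \ \eta_s]^\top$ and $\eta_i \in \mathbb{R}^{n_i}$ represents the private decisions of the agent $i$, $\eta_s \in \mathbb{R}$ is a shared decision among all agents, and $\bar Q$ has the structure

$$\bar Q = \begin{bmatrix} Q_{11} & 0 & \cdots & 0 & Q_{s1} \\ 0 & Q_{22} & & \vdots & Q_{s2} \\ \vdots & & \ddots & 0 & \vdots \\ 0 & \cdots & 0 & Q_{nn} & Q_{sn} \\ Q_{1s} & Q_{2s} & \cdots & Q_{ns} & Q_{ss} \end{bmatrix} \tag{15}$$

$$\bar q^\top = [q_1^\top \ \dots \ q_n^\top \ q_s]. \tag{16}$$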



Here, $Q_{ss} \in \mathbb{R}$ for simplicity, $Q_{ii} \succ 0$, and $Q_{si} = Q_{is}^\top \in \mathbb{R}^{n_i}$. Such structured cost functions are common in optimization for interconnected systems. For instance, the problem of state estimation in electric power networks [13] gives rise to such sparsity structure. Given that $\eta_s$ is scalar, state estimation for an electric power network with the physical structure depicted in Fig. 1(a) results in such structured $\bar Q$.





[Figure 1: (a) Coupling graph. (b) Communication graph.]

Fig. 1. The cost coupling resulting in $\bar Q$ as outlined in (15). In (a) each agent $i \neq s$ represents a large area of the power network, while node $s$ corresponds to the connection point between all the areas. In (b) the agents from different areas need to jointly minimize (14) constrained by the communication network.

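The optimization problem is almost decoupled, except for the shared variable $\eta_s$, and can be solved in a distributed fashion by introducing copies of the shared variable $x_{(i,s)} = \eta_s$ to each agent and solving the optimization problem

$$\begin{array}{ll} \underset{\{\eta_i\},\{x_{(i,s)}\}}{\text{minimize}} & \sum_{i=1}^{n} f_i(\eta_i, x_{(i,s)}) \\ \text{subject to} & x_{(i,s)} = x_{(j,s)}, \quad \forall i, j,\ i \neq j \end{array}$$

with

$$f_i(\eta_i, x_{(i,s)}) \triangleq \frac{1}{2}\begin{bmatrix} \eta_i \\ x_{(i,s)} \end{bmatrix}^\top \begin{bmatrix} Q_{ii} & Q_{is} \\ Q_{si} & \alpha_i Q_{ss} \end{bmatrix}\begin{bmatrix} \eta_i \\ x_{(i,s)} \end{bmatrix} + \begin{bmatrix} q_i \\ \alpha_i q_s \end{bmatrix}^\top \begin{bmatrix} \eta_i \\ x_{(i,s)} \end{bmatrix},$$

where $\alpha_i > 0$ indicates how the cost associated with $\eta_s$ is distributed among the copies $x_{(i,s)}$, with $\sum_{i=1}^{n}\alpha_i = 1$.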

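Since the private variables $\eta_i$ are unconstrained, one can solve for them analytically with respect to the shared variables, yielding

$$\begin{aligned} f_i(x_{(i,s)}) &= \frac{1}{2}x_{(i,s)}^\top \hat Q_i x_{(i,s)} + \hat q_i^\top x_{(i,s)}, \\ \hat Q_i &= \left(Q_{ss}\alpha_i - Q_{is}Q_{ii}^{-1}Q_{si}\right), \\ \hat q_i &= \left(q_s\alpha_i - Q_{si}Q_{ii}^{-1}q_i\right). \end{aligned}$$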
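The elimination of the private variables is a Schur-complement computation per agent. The following NumPy sketch shows one way to evaluate the reduced coefficients; the function name `reduce_agent_cost`, the convention of passing the coupling block as a 1-D array, and the numbers are assumptions made here for illustration.

```python
import numpy as np

def reduce_agent_cost(Q_ii, Q_is, Q_ss, q_i, q_s, alpha_i):
    """Eliminate the private variable eta_i of one agent.

    Q_is is passed as a 1-D array of length n_i (the coupling between
    eta_i and the scalar shared variable), so Q_si is its transpose.
    Returns the scalars (Q_hat_i, q_hat_i) of the reduced cost."""
    Q_hat = alpha_i * Q_ss - Q_is @ np.linalg.solve(Q_ii, Q_is)
    q_hat = alpha_i * q_s - Q_is @ np.linalg.solve(Q_ii, q_i)
    return float(Q_hat), float(q_hat)

# Hypothetical data for a single agent with n_i = 2 private variables.
Q_ii = np.array([[2.0, 0.0], [0.0, 4.0]])
Q_is = np.array([1.0, 2.0])
q_i = np.array([1.0, -1.0])
Q_hat, q_hat = reduce_agent_cost(Q_ii, Q_is, Q_ss=6.0, q_i=q_i, q_s=0.5, alpha_i=0.5)
print(Q_hat, q_hat)
```

Using `np.linalg.solve` rather than forming $Q_{ii}^{-1}$ explicitly is a numerical design choice of this sketch, not something prescribed by the text.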

When $\bar Q$ is positive definite, there exists a set $\{\alpha_i\}$ such that each $f_i(x_{(i,s)})$ is convex, as stated in the following result.

Lemma 2: For $\bar Q \succ 0$, there exist $\{\alpha_i\}$ such that $\sum_{i=1}^{n}\alpha_i = 1$ and $\hat Q_i > 0$ for all $i = 1, \dots, n$.

Proof: See the appendix. ■

Hence the optimization problem can be rewritten as

$$\begin{array}{ll} \underset{\{x_{(i,s)}\}}{\text{minimize}} & \sum_{i=1}^{n} f_i(x_{(i,s)}) \\ \text{subject to} & x_{(i,s)} = x_{(j,s)}, \quad \forall i, j,\ i \neq j \end{array} \tag{17}$$

which reduces to an agreement, or consensus, problem on the shared variable $x_s$ between all the nodes $i \neq s$ depicted in Fig. 1.

Each agent $i$ holds a local copy of the shared variable $x_i \triangleq x_{(i,s)}$ and it only coordinates with its neighbors $\mathcal{N}_i$ to compute the network-wide optimal solution to the agreement problem (17).

The constraints imposed by the graph can be formulated in 
different ways, for instance by assigning auxiliary variables 
to each edge or node [2]. The former is illustrated next. 

A. Enforcing agreement with edge variables 

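Constraints must be imposed on the distributed problem so that consensus is achieved. One such way is to enforce all pairs of nodes connected by an edge to have the same value, $x_i = x_j$ for all $(i, j) \in \mathcal{E}$. To include this constraint in the ADMM formulation, the auxiliary variable $z_{(i,j)}$ is created for each edge $(i, j)$, with $z_{(i,j)} = z_{(j,i)}$, and the problem is formulated as

$$\begin{array}{ll} \underset{\{x_i\},\{z_{(i,j)}\}}{\text{minimize}} & \frac{1}{2}\sum_{i \in \mathcal{V}} f_i(x_i) \\ \text{subject to} & x_i = z_{(i,j)}, \quad \forall i \in \mathcal{V},\ \forall (i, j) \in \mathcal{E} \\ & z_{(i,j)} = z_{(j,i)}, \quad \forall (i, j) \in \mathcal{E}. \end{array}$$

Consider an arbitrary direction for each edge $e_i \in \mathcal{E}$. Now decompose the incidence matrix as $B = B_I + B_O$, where $[B_I]_{ij} = 1$ ($[B_O]_{ij} = 1$) if, and only if, node $j$ is the head (tail) of the edge $e_i = (j, k)$. The optimization problem can be rewritten as

$$\begin{array}{ll} \underset{x,\,z}{\text{minimize}} & \frac{1}{2}x^\top Qx - q^\top x \\ \text{subject to} & \begin{bmatrix} RB_O \\ RB_I \end{bmatrix}x - \begin{bmatrix} R \\ R \end{bmatrix}z = 0, \end{array} \tag{18}$$

where $Q = \mathrm{diag}([\hat Q_1 \ \dots \ \hat Q_n])$, $q^\top = [\hat q_1 \ \dots \ \hat q_n]$, and $W = R^\top R$ is the non-negative diagonal matrix corresponding to the edge-weights.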

Assumption 1: The graph $\mathcal{G}(\mathcal{V}, \mathcal{E}, W)$ is connected. 

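As derived in the previous section, the ADMM iterations can be written in matrix form as (10). Since $\bar F^\top \bar F = 2W$, $\bar E^\top \Pi_{\mathcal{R}(\bar F)}\bar E = \frac{1}{2}(B_I + B_O)^\top W(B_I + B_O) = \frac{1}{2}(D + A)$, and $\bar E^\top \bar E = B_O^\top WB_O + B_I^\top WB_I = D$, we have

$$\begin{aligned} M_{11} &= \rho(Q + \rho D)^{-1}A + I, \\ M_{12} &= -\frac{\rho}{2}(Q + \rho D)^{-1}(D + A). \end{aligned} \tag{19}$$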
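The identities used to obtain (19) can be checked numerically on a small graph. The sketch below is an illustration under assumptions made here: a three-node line graph with arbitrary positive weights, zero-based node indices, and the helper name `edge_matrices`. It builds $B_I$ and $B_O$, forms $\bar E$ and $\bar F$ as in (18), and recovers $D$ and $A$ from them.

```python
import numpy as np

def edge_matrices(n, edges, weights):
    """Head/tail parts of the incidence matrix and the edge-weight matrix.
    Each edge is a (head, tail) pair of zero-based node indices."""
    m = len(edges)
    B_I, B_O = np.zeros((m, n)), np.zeros((m, n))
    for e, (head, tail) in enumerate(edges):
        B_I[e, head] = 1.0
        B_O[e, tail] = 1.0
    return B_I, B_O, np.diag(weights)

# Hypothetical line graph 0 - 1 - 2 with edge weights 0.5 and 2.0.
B_I, B_O, W = edge_matrices(3, [(0, 1), (1, 2)], [0.5, 2.0])
R = np.sqrt(W)                                   # diagonal R with R^T R = W
E_bar = np.vstack([R @ B_O, R @ B_I])
F_bar = -np.vstack([R, R])

D = E_bar.T @ E_bar                              # degree matrix
Pi = F_bar @ np.linalg.solve(F_bar.T @ F_bar, F_bar.T)   # projector onto range(F_bar)
A = 2.0 * E_bar.T @ Pi @ E_bar - D               # from E^T Pi E = (D + A) / 2
print(D)
print(A)
```

The recovered $A$ has zero diagonal and the edge weights in the off-diagonal positions of the two edges, while $D$ carries the weighted degrees.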



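The main result in this paper is stated below and, for given $W = R^\top R$, it explicitly characterizes the optimal $\rho$ solving Problem 1 and the corresponding convergence factor of (10) with $M_{11}$ and $M_{12}$ derived in (19).

Theorem 2: Suppose $W \succeq 0$ is chosen so that $\mathcal{G}$ is connected and $D = \kappa Q$ for $\kappa > 0$. Let $\{\lambda_i\}$ be the set of ordered generalized eigenvalues of $(A, D)$ for which $\lambda_1 \leq \dots \leq \lambda_n = 1$. The optimal step-size $\rho^\star$ that minimizes the convergence factor $\phi^\star$ is

$$\rho^\star = \begin{cases} \dfrac{1}{\kappa\sqrt{1 - \lambda_{n-1}^2}}, & \lambda_{n-1} \geq 0, \\[2ex] \dfrac{1}{\kappa}, & \lambda_{n-1} < 0. \end{cases}$$

Furthermore, the corresponding convergence factor is

$$\phi^\star = |\phi_{2n-1}| = \begin{cases} \dfrac{1}{2}\left(1 + \dfrac{\lambda_{n-1}}{1 + \sqrt{1 - \lambda_{n-1}^2}}\right), & \lambda_{n-1} \geq 0, \\[2ex] \dfrac{1}{2}, & \lambda_{n-1} < 0. \end{cases}$$

Proof: The proof is presented in the appendix. ■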
Note that, for a given $W$, the optimal $\rho^\star$ and convergence factor $|\phi_{2n-1}|$ are parameterized by $\kappa$ and $\lambda_{n-1}$. Moreover, it is easy to see that $|\phi_{2n-1}| \geq \frac{1}{2}$ and minimizing $\lambda_{n-1}$ leads to the minimum convergence factor. Hence, by finding $W^\star$ as the edge-weights minimizing $\lambda_{n-1}$, the optimal scaling is then given by $\rho^\star(\lambda_{n-1}^\star)W^\star$. The optimal choice of $W^\star$ is described in the following section.
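The step-size rule is cheap to evaluate once $\kappa$ and $\lambda_{n-1}$ are known. The sketch below uses the closed forms that Lemma 11 and the proof of Theorem 2 in the appendix give for the two cases $\lambda_{n-1} \geq 0$ and $\lambda_{n-1} < 0$; the function names are chosen here for illustration only.

```python
import math

def optimal_step_size(kappa, lam_n1):
    """Step-size minimizing the convergence factor for given kappa and
    second-largest generalized eigenvalue lam_n1 (with lam_n1 < 1)."""
    if lam_n1 >= 0.0:
        return 1.0 / (kappa * math.sqrt(1.0 - lam_n1 ** 2))
    return 1.0 / kappa

def optimal_convergence_factor(lam_n1):
    """Convergence factor attained at the optimal step-size."""
    if lam_n1 >= 0.0:
        return 0.5 * (1.0 + lam_n1 / (1.0 + math.sqrt(1.0 - lam_n1 ** 2)))
    return 0.5

print(optimal_step_size(1.0, 0.6), optimal_convergence_factor(0.6))
print(optimal_step_size(0.5, -0.3), optimal_convergence_factor(-0.3))
```

The factor never drops below $1/2$ and grows with $\lambda_{n-1}$ on $[0, 1)$, which is why the next subsection minimizes $\lambda_{n-1}$ over the edge weights.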

B. Optimal network-constrained scaling 

Here we address the second part of Problem 1 by computing the optimal scaling matrix $R^\star$ that, together with $\rho^\star$, provides the optimal scaling minimizing the ADMM convergence factor. But first we introduce a transformation to relax the assumption that $D = \kappa Q$. The constraints in the agreement problem (18) enforce $x = 1_n y$ for some $y \in \mathbb{R}$, where $1_n \in \mathbb{R}^n$ is a vector with all entries equal to 1. Therefore the optimization problem is equivalent to

$$\underset{y}{\text{minimize}} \quad \frac{1}{2}y\,1_n^\top Q 1_n\,y - q^\top 1_n y. \tag{20}$$

The next result readily follows.

Lemma 3: Consider the optimization problem (18). For given diagonal $D \succ 0$, the optimal solution to (18) remains unchanged when $Q$ is replaced by $\frac{1}{\kappa}D$ if $\kappa = \dfrac{1_n^\top D 1_n}{1_n^\top Q 1_n}$.

Proof: The proof follows directly from converting (18) to (20) and having $1_n^\top Q 1_n = \frac{1}{\kappa}1_n^\top D 1_n$. ■

Thus the constraint $D = \kappa Q$ can be achieved for any $D \succ 0$ by modifying the original problem (18), replacing $Q$ with $\frac{1}{\kappa}D$ and letting $\kappa = \dfrac{1_n^\top D 1_n}{1_n^\top Q 1_n}$. Below we show how the minimization of $\lambda_{n-1}$ with respect to $W$ can be formulated, where the adjacency matrix $A$ is determined by the edge-weights $W$ and graph-induced sparsity pattern $\mathcal{A}$.

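Theorem 3: Consider the weighted graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, W)$ and assume there exist non-negative edge-weights $W = \{w_{ij}\}$ such that $\mathcal{G}$ is connected. The non-negative edge-weights $\{w_{ij}\}$ minimizing the second largest generalized eigenvalue of $(A, D)$, $\lambda_{n-1}$, while having $\mathcal{G}$ connected are obtained from the optimal solution to the quasi-convex problem

$$\begin{array}{ll} \underset{\{w_{ij}\},\,\lambda}{\text{minimize}} & \lambda \\ \text{subject to} & w_{ij} \geq 0, \quad \forall i, j \in \mathcal{V}, \\ & A_{ij} = w_{ij}, \quad \forall (i, j) \in \mathcal{E}, \\ & A_{ij} = 0, \quad \forall (i, j) \notin \mathcal{E}, \\ & D = \mathrm{diag}(A 1_n), \\ & D \succ \epsilon I, \\ & A - D - 1_n 1_n^\top \prec 0, \\ & P^\top(A - \lambda D)P \prec 0, \end{array} \tag{21}$$

where the columns of $P \in \mathbb{R}^{n \times (n-1)}$ form an orthonormal basis of $\mathcal{N}(1_n^\top)$ and $\epsilon > 0$.

Proof: The proof is in the appendix. ■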
Given the results derived in this section, the optimal scaling $\rho^\star W^\star$ solving Problem 1 can be computed as summarized in Algorithm 1.



Algorithm 1 Optimal Network-Constrained Scaling 

1) Compute $W^\star$ and the corresponding $D^\star$ and $\lambda_{n-1}^\star$ according to Theorem 3. 

2) Given $D^\star$ and $Q$, compute $\kappa^\star$ from Lemma 3. 

3) Given $\kappa^\star$ and $\lambda_{n-1}^\star$, compute the optimal step-size $\rho^\star$ as described in Theorem 2. 

4) The optimal scaling for the ADMM algorithm with $Q$ replaced by $\frac{1}{\kappa^\star}D^\star$ is $\rho^\star W^\star$. 
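Steps 2) to 4) of Algorithm 1 only need standard linear algebra once the weights are fixed. The sketch below assumes the weighted adjacency matrix is already available (step 1, the quasi-convex problem of Theorem 3, is not solved here) and that all degrees are positive. The function name `scaling_from_weights` and the four-node path graph with unit weights are illustrative assumptions, and the step-size expression is the one given by Lemma 11 in the appendix.

```python
import math
import numpy as np

def scaling_from_weights(A, Q_diag):
    """Given a weighted adjacency matrix A and the diagonal of Q, return
    (lambda_{n-1}, kappa, rho) following steps 2-3 of Algorithm 1."""
    d = A.sum(axis=1)                        # D = diag(A 1_n), assumed positive
    S = A / np.sqrt(np.outer(d, d))          # D^{-1/2} A D^{-1/2}: same spectrum as the pencil (A, D)
    lam = np.linalg.eigvalsh(S)              # eigenvalues in ascending order
    lam_n1 = float(lam[-2])
    kappa = float(d.sum() / np.sum(Q_diag))  # Lemma 3 with diagonal D and Q
    if lam_n1 >= 0.0:
        rho = 1.0 / (kappa * math.sqrt(1.0 - lam_n1 ** 2))
    else:
        rho = 1.0 / kappa
    return lam_n1, kappa, rho

# Hypothetical example: path graph on four nodes, unit weights, Q = I.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
lam_n1, kappa, rho = scaling_from_weights(A, np.ones(4))
print(lam_n1, kappa, rho)
```

The resulting scaling would then be `rho * W`, with $Q$ replaced by $D/\kappa$ as in step 4).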



V. Numerical examples

Next we illustrate our results in numerical examples.

A. Distributed quadratic programming

Consider a distributed quadratic programming problem with $n = 3$ agents and an objective function of the form (14). As shown previously, the optimization problem can be reformulated on the form

$$\begin{array}{ll} \text{minimize} & \sum_{i=1}^{n}\frac{1}{2}x_{(i,s)}^\top \hat Q_i x_{(i,s)} + \hat q_i^\top x_{(i,s)} \\ \text{subject to} & x_{(i,s)} = x_{(j,s)}, \quad \forall i \neq j \end{array}$$

with $n = 3$, $\alpha = \frac{1}{3}[0.5 \ 0.9 \ 1.6]$, $\hat Q_1 = 0.5507$, $\hat Q_2 = 0.0667$, $\hat Q_3 = 0.2232$, $\hat q_1 = -0.3116$, $\hat q_2 = -0.3667$, and $\hat q_3 = -0.1623$. As for the communication graph, we consider a line graph with node 2 connected to nodes 1 and 3. Algorithm 1 is applied, resulting in $\lambda_{n-1}^\star = 0$ with the edge weights $w_{e_1} = w_{e_2} = 0.1566$ and degree matrix $D = \mathrm{diag}([0.1566 \ 0.3132 \ 0.1566])$. From Theorem 2 we then have $\rho^\star = 1/\kappa^\star$ and $\phi^\star = |\phi_{2n-1}| = 0.5$, which is the best achievable convergence factor. The performance of the ADMM algorithm with optimal network-constrained scaling is presented in Fig. 2. The performance of the unscaled ADMM algorithm with unitary edge weights and manually optimized step-size $\rho$ is also depicted for comparison. The convergence factor of the manually tuned ADMM algorithm is $|\phi_{2n-1}| = 0.557$, thus exhibiting worse performance than the optimally scaled algorithm as depicted in Fig. 2.

[Figure 2: normalized error versus number of iterations.]

Fig. 2. Normalized error for the scaled ADMM algorithm with $W^\star$ and $\rho^\star$ obtained from Algorithm 1 and the unscaled ADMM algorithm with unitary edge weights and manually selected best step-size $\rho = 0.55$ via exhaustive search.
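The convergence factor reported for this example can be cross-checked from the spectrum of $M$. The sketch below is an illustrative check rather than the authors' simulation code: it takes the edge weights and $\hat Q_i$ values quoted in the example, replaces $Q$ by $D/\kappa$ as in Lemma 3, uses the step-size $1/\kappa$ (the value Lemma 11 gives for $\lambda_{n-1} = 0$), builds $M$ from (10) and (19), and inspects the two largest eigenvalue magnitudes.

```python
import numpy as np

w = 0.1566                                      # edge weights from the example
A = w * np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))                      # diag([0.1566, 0.3132, 0.1566])
Q_hat = np.array([0.5507, 0.0667, 0.2232])

kappa = np.trace(D) / Q_hat.sum()               # Lemma 3 for diagonal D and Q
Q = D / kappa                                   # Q replaced by D / kappa
rho = 1.0 / kappa                               # step-size when lambda_{n-1} = 0

K = rho * np.linalg.inv(Q + rho * D)
M11 = K @ A + np.eye(3)                         # (19)
M12 = -0.5 * K @ (D + A)                        # (19)
M = np.block([[M11, M12], [np.eye(3), np.zeros((3, 3))]])

mags = np.sort(np.abs(np.linalg.eigvals(M)))
print(mags[-1], mags[-2])                       # fixed-point eigenvalue and convergence factor
```

The largest magnitude corresponds to the fixed point and the second largest to the convergence factor; a repeated eigenvalue makes the numerical value accurate only to roughly the square root of machine precision.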



B. Distributed consensus 

In this section we apply our methodology to derive optimally scaled ADMM iterations for the average consensus problem and compare our convergence factors with the state-of-the-art fast consensus algorithm presented in [2]. The average consensus problem is a particular case of (18) where $x \in \mathbb{R}^n$, $Q = \alpha I$ for some $\alpha \in \mathbb{R}$, and $q = 0$. As an indicator of the performance, we compute the convergence factors for the two methods on a large number of randomly generated Erdos-Renyi graphs. Fig. 3 presents Monte Carlo simulations of the convergence factors versus the number of nodes $n \in [5, 20]$. Each component in the adjacency matrix $A$ is non-zero with probability $p = (1 + \epsilon)\frac{\log(n)}{n}$, where $\epsilon \in (0, 1)$ and $n$ is the number of vertices. In our simulations, we consider two scenarios: sparse graphs with $\epsilon = 0.2$ and dense topologies with $\epsilon = 0.8$. For every network size, 50 network instances are generated, and the convergence factors are computed and averaged to generate the depicted results. The figure shows two versions of Algorithm 1, with and without the weight optimization in Theorem 3. We observe a significant improvement compared to the state-of-the-art fast consensus [2] in both sparse and dense topologies.

Fig. 3. Performance comparison of the proposed optimal scaling for the ADMM algorithm with state-of-the-art fast-consensus [2]. The network of size $n \in [5, 20]$ is randomly generated by Erdos-Renyi graphs with low and high densities $\epsilon \in \{0.2, 0.8\}$.



VI. Conclusions and future work 

Optimal scaling of the ADMM method for distributed 
quadratic programming was addressed. In particular, a class 
of distributed quadratic problems was cast as equality- 
constrained quadratic problems, to which the scaled ADMM 
method is applied. For this class of problems, the network- 
constrained scaling corresponds to the usual step-size con- 
stant and the edge weights of the communication graph. 
Under mild assumptions on the communication graph, an- 
alytical expressions for the optimal convergence factor and 
the optimal step-size were derived in terms of the spectral 
properties of the graph. Supposing the optimal step-size is 
chosen, the convergence factor is further minimized by opti- 
mally choosing the edge weights. Our results were illustrated 
in numerical examples and significant performance improve- 
ments over state-of-the-art techniques were demonstrated. As 
future work, we plan to extend the results to a broader class 
of distributed quadratic problems. 
Appendix 

A. Proof of Lemma 1

Let $Q = R_Q^\top R_Q$ and choose $R = \sqrt{\kappa}R_Q(E^\top E)^{-1}E^\top$. Then we have $\bar E^\top \bar E = E^\top R^\top RE = E^\top \kappa E(E^\top E)^{-1}Q(E^\top E)^{-1}E^\top E = \kappa Q$. For the second part of the proof, note that the full-rank assumption on $E$ and $F$ implies that $x = -E^\dagger Fz$ in (2), where $\dagger$ denotes the pseudo-inverse. Consider the scaled version of the constraint, $R(Ex + Fz) = 0$, with the choice $R = \sqrt{\kappa}R_Q(E^\top E)^{-1}E^\top$. Replacing $R$ in the aforementioned constraint yields $\sqrt{\kappa}R_Q x + \sqrt{\kappa}R_Q(E^\top E)^{-1}E^\top Fz = 0$. It indicates $x = -(\sqrt{\kappa}R_Q)^{-1}\sqrt{\kappa}R_Q(E^\top E)^{-1}E^\top Fz = -(E^\top E)^{-1}E^\top Fz = -E^\dagger Fz$.
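The construction of $R$ in this proof can be reproduced numerically. The sketch below assumes a Cholesky factor is used for $R_Q$ (any factor with $Q = R_Q^\top R_Q$ would do); the function name `scaling_matrix` and the test data are hypothetical. It only verifies the identity $\bar E^\top \bar E = \kappa Q$.

```python
import numpy as np

def scaling_matrix(E, Q, kappa):
    """R = sqrt(kappa) * R_Q * (E^T E)^{-1} E^T with Q = R_Q^T R_Q."""
    R_Q = np.linalg.cholesky(Q).T            # upper-triangular factor, Q = R_Q^T R_Q
    return np.sqrt(kappa) * R_Q @ np.linalg.solve(E.T @ E, E.T)

# Hypothetical data: E has full column rank, Q is positive definite.
E = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]])
Q = np.array([[2.0, 0.5], [0.5, 1.0]])
kappa = 0.7
R = scaling_matrix(E, Q, kappa)
E_bar = R @ E
print(E_bar.T @ E_bar)                        # should match kappa * Q
```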

B. Proof of Lemma 2

Let $T_i = Q_{is}Q_{ii}^{-1}Q_{si}$ and note that if $\bar Q \succ 0$ then $Q_{ss} - \sum_{i=1}^{n}Q_{is}Q_{ii}^{-1}Q_{si} = Q_{ss} - \sum_{i=1}^{n}T_i > 0$, as seen by taking the Schur complement of $\bar Q$ with respect to $Q_{ss}$. For $i = 2, \dots, n$ and $\epsilon > 0$, let $Q_{ss}\alpha_i - T_i = \epsilon$ and define $\alpha_i = (T_i + \epsilon)Q_{ss}^{-1}$. All that remains is to compute $\alpha_1$ such that $\sum_{i=1}^{n}\alpha_i = 1$ and $Q_{ss}\alpha_1 - T_1 > 0$. Using the former equation we have $\alpha_1 + \sum_{i=2}^{n}(T_i + \epsilon)Q_{ss}^{-1} = 1$, and subtracting $T_1 Q_{ss}^{-1}$ from both sides it can be rewritten as $Q_{ss}\alpha_1 - T_1 = Q_{ss} - \sum_{i=1}^{n}T_i - (n - 1)\epsilon$. Since $\epsilon$ is arbitrary, we let $Q_{ss}\alpha_1 - T_1 = \epsilon$ and so we have $\epsilon = \frac{1}{n}\left(Q_{ss} - \sum_{i=1}^{n}T_i\right) > 0$, which concludes the proof.
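The proof is constructive, so the weights $\alpha_i$ can be computed directly. The following pure-Python sketch follows that construction with the common margin $\epsilon = \frac{1}{n}(Q_{ss} - \sum_i T_i)$; the function name `split_weights` and the numbers are hypothetical, and the $T_i$ are assumed to be precomputed scalars.

```python
def split_weights(Q_ss, T):
    """Weights alpha_i with sum 1 and Q_ss*alpha_i - T_i = eps > 0 for all i,
    where T[i] stands for Q_is Q_ii^{-1} Q_si."""
    n = len(T)
    eps = (Q_ss - sum(T)) / n
    if eps <= 0.0:
        raise ValueError("Q_ss - sum(T) must be positive (Schur complement condition)")
    return [(t + eps) / Q_ss for t in T]

# Hypothetical values with Q_ss - sum(T) = 1 > 0.
Q_ss = 4.0
T = [1.0, 0.5, 1.5]
alpha = split_weights(Q_ss, T)
print(alpha, sum(alpha))
print([Q_ss * a - t for a, t in zip(alpha, T)])   # each reduced Hessian equals eps
```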

C. Proof of Theorem 2

We are interested in minimizing the convergence factor of (10), which is expressed in terms of the magnitude of the eigenvalues of $M$, $\{\phi_i\}$. From Theorem 1 the eigenvalues of $M$ are given by

$$2\phi(\rho, \bar\lambda) = 1 + \bar\lambda f(\rho) \pm \sqrt{1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)} = 2\phi_r(\rho, \bar\lambda) + j\,2\phi_c(\rho, \bar\lambda), \tag{22}$$

with $f(\rho) = \dfrac{\rho\kappa}{1 + \rho\kappa} > 0$, and $\bar\lambda = \dfrac{v^\top Av}{v^\top Dv}$. Recalling $D = \kappa Q$, in the sequel we select $v \in \mathbb{R}^n$ such that $v^\top Dv = \kappa$. Taking into account this scaling and considering (19) we conclude $\bar\lambda = \kappa^{-1}v^\top Av$. Some properties of $\bar\lambda$ follow.

Lemma 4: Let $v^\top Dv = \kappa$ for $\kappa > 0$ and define $\kappa\bar\lambda = v^\top Av$. Then the following holds

$$-1 \leq \lambda_1 = \min_v \frac{v^\top Av}{v^\top Dv} \leq \bar\lambda \leq \max_v \frac{v^\top Av}{v^\top Dv} = \lambda_n = 1,$$

where $\lambda_i$ is the $i$-th generalized eigenvalue of the matrix pencil $(A, D)$, ordered as $\lambda_n \geq \dots \geq \lambda_i \geq \dots \geq \lambda_1$.

Proof: The proof comes from Assumption 1. Specifically, if $\mathcal{G}$ is connected with $W \succeq 0$ it is well-known from graph theory [15] that $D \succ 0$, $D - A \succeq 0$, and $D + A \succeq 0$. Therefore we have $-v^\top Dv \leq v^\top Av \leq v^\top Dv$. One can divide the former inequalities by $v^\top Dv$ and conclude the claimed bounds, since $D \succ 0$. Moreover, from $(D - A)1_n = 0$ we conclude that $D - A$ is singular and thus $\lambda_n = 1$ is the largest generalized eigenvalue of $(A, D)$, i.e., the largest value for which we have $|A - \lambda_n D| = 0$. Similarly, $\lambda_1 = -1$ if $D + A$ is singular, which is not necessarily implied by Assumption 1. ■

We have seen that the largest eigenvalue of $M$ in magnitude is equal to 1 and it corresponds to the fixed-point of the ADMM algorithm (18). So we discard it and focus on its second largest eigenvalue in magnitude. The following result characterizes the second largest eigenvalue of $M$.

Lemma 5: The magnitude of the second largest eigenvalue of $M$, $|\phi_{2n-1}|$, is given by the following equation

$$|\phi_{2n-1}| = \max\left\{\max_{\bar\lambda \in [\lambda_1, \lambda_{n-1}]}|\phi(\rho, \bar\lambda)|,\ \frac{\rho\kappa}{1 + \rho\kappa}\right\}. \tag{23}$$

Proof: Recall that all the eigenvalues of $M$ satisfy (22). Consider $v = a1_n$ with $a \in \mathbb{R}_+$; then $\kappa = v^\top Dv = a^2 1_n^\top D1_n$. Hence (22) becomes

$$\phi\left(\rho, \kappa^{-1}a^2 1_n^\top A1_n\right) \in \left\{1,\ \frac{\rho\kappa}{1 + \rho\kappa}\right\}. \tag{24}$$

But $\phi = 1$ is the simple maximum eigenvalue of $M$ and we discard it. Still, the second term of (24) might be the magnitude of the second largest eigenvalue of $M$. Another possibility for the maximum magnitude of $\phi_{2n-1}$ is when we have $v$ orthogonal to $1_n$ in the definition of $\bar\lambda$; i.e.,

$$\max_{v:\ 1_n^\top v = 0}\left|\phi\left(\rho, \tfrac{1}{\kappa}v^\top Av\right)\right|.$$

Since $1_n$ is also the eigenvector associated with the largest generalized eigenvalue of $(A, D)$, $\lambda_n$, the optimization can be performed over $\bar\lambda = v^\top Av/\kappa$. As a result we have

$$\max_{\bar\lambda \in [\lambda_1, \lambda_{n-1}]}|\phi(\rho, \bar\lambda)|,$$

where $\bar\lambda$ is bounded above by $\lambda_{n-1}$, as requiring $1_n^\top v = 0$ excludes the largest generalized eigenvalue $\lambda_n = 1$. ■

The optimal step-size $\rho^\star$ that minimizes $|\phi_{2n-1}|$ is

$$\rho^\star = \underset{\rho}{\operatorname{argmin}}\ \max\left\{\max_{\bar\lambda \in [\lambda_1, \lambda_{n-1}]}|\phi(\rho, \bar\lambda)|,\ \frac{\rho\kappa}{1 + \rho\kappa}\right\}. \tag{25}$$

In the following we focus on the first term of (25) and characterize $(\rho^\star, \bar\lambda^\star)$ as the solution to

$$(\rho^\star, \bar\lambda^\star) = \underset{\rho}{\operatorname{argmin}}\ \underset{\bar\lambda \in [\lambda_1, \lambda_{n-1}]}{\operatorname{argmax}}\ |\phi(\rho, \bar\lambda)|. \tag{26}$$

The next result characterizes the behavior of $|\phi(\rho, \bar\lambda)|$ with respect to $\rho$ for the case where the value of (22) is complex, i.e., $\phi_c(\rho, \bar\lambda) \neq 0$.

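Lemma 6: Let $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$ and $\phi_c(\rho, \bar\lambda) \neq 0$. Then $|\phi(\rho, \bar\lambda)|$ is monotonically increasing with respect to $\rho$.

Proof: Recall from Lemma 4 that $[\lambda_1, \lambda_{n-1}] \subseteq [-1, 1)$. For $\phi_c(\rho, \bar\lambda) \neq 0$, $|\phi(\rho, \bar\lambda)| = \sqrt{\dfrac{\rho\kappa\bar\lambda + \rho\kappa}{2(1 + \rho\kappa)}}$. The derivative of $|\phi(\rho, \bar\lambda)|$ with respect to $\rho$ is

$$\nabla_\rho|\phi(\rho, \bar\lambda)| = \frac{1}{2}\sqrt{\frac{2(1 + \rho\kappa)}{\rho\kappa\bar\lambda + \rho\kappa}}\ \frac{\kappa\bar\lambda + \kappa}{2(1 + \rho\kappa)^2} > 0,$$

since $|\bar\lambda| < 1$, which proves that $|\phi(\rho, \bar\lambda)|$ is monotonically increasing for $\phi_c(\rho, \bar\lambda) \neq 0$. ■

We now analyze the monotonicity of $|\phi(\rho, \bar\lambda)|$ with respect to $\bar\lambda$.

Lemma 7: Let $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$ and $\phi_c(\rho, \bar\lambda) \neq 0$. Then $|\phi(\rho, \bar\lambda)|$ is monotonically increasing with respect to $\bar\lambda$.

Proof: For $\phi_c(\rho, \bar\lambda) \neq 0$, $|\phi(\rho, \bar\lambda)| = \sqrt{\dfrac{\rho\kappa\bar\lambda + \rho\kappa}{2(1 + \rho\kappa)}}$. The derivative of $|\phi(\rho, \bar\lambda)|$ with respect to $\bar\lambda$ is

$$\nabla_{\bar\lambda}|\phi(\rho, \bar\lambda)| = \frac{1}{2}\sqrt{\frac{2(1 + \rho\kappa)}{\rho\kappa + \kappa\rho\bar\lambda}}\ \frac{\kappa\rho}{2(1 + \rho\kappa)} = \frac{1}{2}\sqrt{\frac{\kappa\rho}{2(1 + \rho\kappa)}}\ \frac{1}{\sqrt{1 + \bar\lambda}} > 0,$$

since $|\bar\lambda| < 1$. ■

For $\phi_c \neq 0$, Lemma 6 indicates that $\rho$ should be as small as possible, while Lemma 7 indicates that the optimal $\bar\lambda$ corresponds to $\lambda_{n-1}((A, D))$. Therefore, since for $\bar\lambda = \lambda_{n-1}$ we have that $|\phi_c|$ is monotonically increasing with $\rho$, the second largest eigenvalue of $M$ under the optimal $\rho^\star$ is real and $\phi_c(\rho^\star, \bar\lambda^\star) = 0$. The following lemmas address this case and characterize the derivatives of $|\phi(\rho, \bar\lambda)|$ with respect to $\rho$ and $\bar\lambda$ when $\phi(\rho, \bar\lambda)$ is real, i.e., $\phi_c(\rho, \bar\lambda) = 0$.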



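Lemma 8: Let $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$ and $\phi_c(\rho, \bar\lambda) = 0$. Then $|\phi(\rho, \bar\lambda)|$ is monotonically decreasing with respect to $\rho$.

Proof: When $\phi_c(\rho, \bar\lambda) = 0$, we have $|\phi(\rho, \bar\lambda)| = |\phi_r(\rho, \bar\lambda)|$ and

$$2|\phi(\rho, \bar\lambda)| = \left|1 + \bar\lambda f(\rho) \pm \sqrt{1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)}\right|,$$

where $\kappa\bar\lambda = v^\top Av$, and $v \perp 1_n$. Moreover, since $|\bar\lambda| < 1$ we have

$$2|\phi(\rho, \bar\lambda)| = 1 + \bar\lambda f(\rho) + \sqrt{1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)}.$$

Let $g(\rho, \bar\lambda) = 2|\phi(\rho, \bar\lambda)|$. Recall that we are interested in the values of $\lambda_1 \leq \bar\lambda \leq \lambda_{n-1} < 1$. Taking the derivative of $g(\rho, \bar\lambda)$ with respect to $\rho$ yields

$$\nabla_\rho g = f'(\rho)\left(\bar\lambda + \left(1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)\right)^{-\frac{1}{2}}\left(\bar\lambda^2 f(\rho) - 1\right)\right).$$

Since $f'(\rho) = \dfrac{\kappa}{(1 + \rho\kappa)^2} > 0$, we can further simplify the above derivative and check its negativity:

$$\nabla_\rho g < 0 \iff \bar\lambda + \left(1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)\right)^{-\frac{1}{2}}\left(\bar\lambda^2 f(\rho) - 1\right) < 0.$$

By replacing $f(\rho)$ in the second term of the right-hand side inequality we have

$$\bar\lambda - \left(1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)\right)^{-\frac{1}{2}}\frac{1 + \rho\kappa(1 - \bar\lambda^2)}{1 + \rho\kappa} \overset{(a)}{<} \bar\lambda - \left(1 - \frac{\rho\kappa}{1 + \rho\kappa}\right)^{-1}\frac{1 + \rho\kappa(1 - \bar\lambda^2)}{1 + \rho\kappa} = -\left(1 - \bar\lambda + \rho\kappa(1 - \bar\lambda^2)\right) < 0,$$

where in (a) we have replaced $\bar\lambda < 1$ with its upper bound in the inverse square root term. ■

Lemma 9: Let $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$ and $\phi_c(\rho, \bar\lambda) = 0$. Then $|\phi(\rho, \bar\lambda)|$ is monotonically increasing with respect to $\bar\lambda$ if and only if either of the following holds:

1) $\bar\lambda \geq 0$;

2) $\bar\lambda < 0$ and $\rho\kappa < 1$.

Proof: Recall that for $\phi_c(\rho, \bar\lambda) = 0$ we have

$$2|\phi(\rho, \bar\lambda)| = 1 + \bar\lambda f(\rho) + \sqrt{1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)}.$$

Defining $g(\rho, \bar\lambda) = 2|\phi(\rho, \bar\lambda)|$ we have

$$\nabla_{\bar\lambda}g = f(\rho) + \left(1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)\right)^{-\frac{1}{2}}\bar\lambda f(\rho)^2.$$

For $\bar\lambda \geq 0$ it follows that $\nabla_{\bar\lambda}g \geq f(\rho) > 0$. Supposing that $\bar\lambda < 0$, we see that $\nabla_{\bar\lambda}g > 0$ is equivalent to having $\sqrt{1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)} > |\bar\lambda|f(\rho)$. Taking the square of both sides of the latter inequality we conclude that $\nabla_{\bar\lambda}g > 0$ for $\bar\lambda < 0$ holds if and only if $\rho\kappa < 1$. ■

The former results are now used to characterize the optimal $\bar\lambda$ that maximizes $|\phi(\rho, \bar\lambda)|$.

Lemma 10: Let $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$. For a given $\rho$ we have

$$\bar\lambda^\star = \underset{\bar\lambda \in [\lambda_1, \lambda_{n-1}]}{\operatorname{argmax}}\ |\phi(\rho, \bar\lambda)| = \begin{cases} \lambda_{n-1}, & \text{if } \lambda_{n-1} \geq 0, \\ \lambda_{n-1}, & \text{if } \lambda_{n-1} < 0 \text{ and } \rho < 1/\kappa, \\ \lambda_1, & \text{if } \lambda_{n-1} < 0 \text{ and } \rho > 1/\kappa. \end{cases}$$

Similarly, the following result describes the optimal $\rho$ for a given $\bar\lambda$.

Lemma 11: For given $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$ we have

$$\rho^\star = \underset{\rho}{\operatorname{argmin}}\ |\phi(\rho, \bar\lambda)| = \frac{1}{\kappa\sqrt{1 - \bar\lambda^2}}.$$

Proof: The proof follows from the monotonicity properties of $|\phi(\rho, \bar\lambda)|$ in Lemma 6 and Lemma 8, which indicate that the optimal $\rho$ yields $\sqrt{1 + \bar\lambda^2 f(\rho)^2 - 2f(\rho)} = 0$ with $f(\rho) = \dfrac{\rho\kappa}{1 + \rho\kappa}$. ■

The proof of Theorem 2 follows from the previous results and is now presented.

Proof: [Proof of Theorem 2] The proof comes from (25), Lemma 11 and Lemma 10. First suppose that $\lambda_{n-1} \geq 0$ and define $\rho_{n-1} = \dfrac{1}{\kappa\sqrt{1 - \lambda_{n-1}^2}}$. Given the monotonicity properties of $|\phi(\rho, \bar\lambda)|$ for $\bar\lambda \in [\lambda_1, \lambda_{n-1}]$ and the fact that $|\phi(\rho_{n-1}, \lambda_{n-1})| \geq |\phi(\rho_{n-1}, \lambda_i)|$ for any $\lambda_i \leq \lambda_{n-1}$, we have that $(\rho_{n-1}, \lambda_{n-1})$ is a saddle-point of $|\phi(\rho, \bar\lambda)|$ when $\lambda_{n-1} \geq 0$. Moreover we have

$$|\phi(\rho_{n-1}, \lambda_{n-1})| = \frac{1}{2}\left(1 + \frac{\lambda_{n-1}}{1 + \sqrt{1 - \lambda_{n-1}^2}}\right), \qquad f(\rho_{n-1}) = \frac{\rho_{n-1}\kappa}{1 + \rho_{n-1}\kappa} = \frac{1}{1 + \sqrt{1 - \lambda_{n-1}^2}},$$

and so the second largest eigenvalue of $M$ given by (25) is $\max\{|\phi(\rho_{n-1}, \lambda_{n-1})|, f(\rho_{n-1})\}$. Observing that $|\phi(\rho_{n-1}, \lambda_{n-1})| - f(\rho_{n-1}) \geq 0$ holds for $1 > \lambda_{n-1} \geq 0$ concludes the first part of the proof.

Now suppose that $\lambda_1 \leq \lambda_{n-1} < 0$. From Lemma 11 the optimal $\rho$ for a given $\bar\lambda$ is greater than $1/\kappa$. Therefore the pair $(\rho_1, \lambda_1)$ with $\rho_1 = \dfrac{1}{\kappa\sqrt{1 - \lambda_1^2}}$ is a saddle-point of $|\phi(\rho, \bar\lambda)|$, since $\rho_1\kappa > 1$ for $\lambda_1 \neq 0$ and given the monotonicity properties in Lemma 10. However, note that $|\phi(\rho_1, \lambda_1)| - f(\rho_1) < 0$ for $\lambda_1 < 0$, and hence the second largest eigenvalue of $M$ is governed by $f(\rho_1)$ in (25). Furthermore, $f(\rho)$ is monotone increasing with respect to $\rho$; i.e., we have $f(\rho_1) > f(\rho)$ for all $\rho < \rho_1$. From this and Lemma 11 we conclude that $\rho^\star = 1/\kappa$, the intersection point of the curves belonging to the negative $\lambda_1$ and $\lambda_{n-1}$, is the minimizer of $|\phi_{2n-1}|$. In fact, $\rho^\star = 1/\kappa$ yields $|\phi(\rho^\star, \lambda_1)| = \dots = |\phi(\rho^\star, \lambda_{n-1})| = f(\rho^\star) = 1/2$ and it is the optimal step-size when $\lambda_1 \leq \lambda_{n-1} < 0$. ■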

D. Proof of Theorem 3

The first inequality constraint ensures the non-negativity of the edge-weights, while the equality constraints merely construct the adjacency and degree matrices $A$ and $D$, respectively, and $D \succ \epsilon I$ ensures the problem is not numerically ill-conditioned. Therefore the main elements of the optimization problem are the last two inequality constraints.

First we show that the second-last inequality constraint ensures the graph is connected. From graph theory we have that $A - D$ is negative semi-definite for non-negative edge-weights. Moreover, $A - D$ has a single zero eigenvalue if, and only if, the respective graph $\mathcal{G}$ is connected, and the corresponding eigenvector is $1_n$. Hence $\mathcal{G}$ is connected if and only if $A - D - 1_n 1_n^\top$ is negative definite.

The final part of the proof shows that, for connected graphs and $P \in \mathbb{R}^{n \times (n-1)}$ being an orthonormal basis of $\mathcal{N}(1_n^\top)$, $P^\top(A - \lambda D)P \prec 0$ holds if and only if $\lambda > \lambda_{n-1}$. Consider the matrix pencil $(A, D) = A - \lambda D$ for $\lambda \in \mathbb{R}$ and let $\{\lambda_i\}$ be the ordered set of generalized eigenvalues of $(A, D)$ so that $\lambda_1 \leq \dots \leq \lambda_n$. Recall that $\lambda_n = 1$ has $1_n$ as the corresponding eigenvector and that for $\lambda > \lambda_n$ we have $A - \lambda D \prec 0$. Additionally, for $\lambda_n \geq \lambda > \lambda_{n-1}$ the matrix $A - \lambda D$ has one non-negative eigenvalue and $v^\top(A - \lambda D)v < 0$ if and only if $v \notin \mathcal{R}(1_n)$. Defining $P$ as an orthonormal basis for $\mathcal{N}(1_n^\top)$, we have $y^\top P^\top(A - \lambda D)Py < 0$, $\forall y \in \mathbb{R}^{n-1}$, since $1_n^\top Py = 0$. Hence we conclude that $P^\top(A - \lambda D)P \prec 0$ if and only if $\lambda > \lambda_{n-1}$.



